In this paper, the authors develop a method of detecting correlations between epidemic patterns in different 
regions that are due to human movement and introduce a null model in which the travel-induced correlations 
are cancelled. They apply this method to the well-documented cases of seasonal influenza outbreaks in the 
United States and France. In the United States (using data for 1972-2002), the authors observed strong short- 
range correlations between several states and their immediate neighbors, as well as robust long-range spreading 
patterns resulting from large domestic air-traffic flows. The stability of these results over time allowed the 
authors to draw conclusions about the possible impact of travel restrictions on epidemic spread. The authors 
also applied this method to the case of France (1984-2004) and found that on the regional scale, there was no 
transportation mode that clearly dominated disease spread. The simplicity and robustness of this method suggest 
that it could be a useful tool for detecting transmission channels in the spread of epidemics. 



PACS numbers: 89.75.Hc, 87.23.Ge, 87.19.Xx 

Understanding quantitatively how a disease spreads in modern society is a crucial issue. In particular, the high probability that a new influenza pandemic will occur raises interest in the design of efficient containment policies [1-5] and calls for an accurate characterization of spatiotemporal epidemic patterns. Recent outbreaks of highly communicable diseases [6-11] have triggered a series of studies on the mechanisms of global disease spread [12-14], and other studies have addressed the issue at the level of individual countries [...]. In all of these studies, the travel and movement of individuals play a crucial role [...]. For control strategies, it is of the highest importance to identify the main channels of transmission, or "epidemic pathways", if any exist [14]. Indeed, identifying such pathways provides a first hint on how to control a disease's spread. Even if, in most cases, travel restrictions are economically unrealistic, knowing the most important transmission channels could help to slow down the epidemic through selective travel restrictions. In addition to epidemics of emerging diseases, recurrent influenza epidemics are a burden for societies located in temperate areas. They affect approximately 5-15 percent of the population worldwide and are responsible for 250,000-500,000 deaths annually [20]. Different influenza surveillance systems have been set up in various parts of the world [21-23], and influenza has been tracked for a long time (the World Health Organization Global Influenza Surveillance Network was established in 1952), which makes it a well-documented disease. These data provide an important basis for testing and developing strategies and for uncovering the main transmission mechanisms of a disease at different scales. Indeed, the spread of influenza has been the focus of many studies for several years, and a number of models have been proposed to describe and understand it [..., 25, 26, ...]. Other investigators have thoroughly described the spatial distribution
of influenza spread [28, 29] and of annual waves of infection in the United States [...]. Because of the complexity of epidemic processes, together with the "noise" present in data, innovative statistical analyses need to be developed for detecting patterns. For example, using signal processing methods, Brownstein et al. [16] recently found evidence of correlations between domestic airline volume and the transnational spreading time of influenza. We think that at this point the convergence of different methods is critical, and we propose another method, which requires little data manipulation and filtering. In this paper, we discuss our results in the light of recent studies [...] and bring our perspective to issues such as the existence of preferred channels of transmission and the impact of travel restrictions.

We have developed a robust and relatively simple method of detecting correlations between different areas ("robust spatial patterns") in empirical data on disease dynamics, and we illustrate and apply it to the well-documented cases of influenza dynamics in the United States and France. Our goal was to detect genuine correlations between different regions while making as few assumptions as possible. The difficulty stems from the fact that a correlation coefficient usually aggregates different phenomena. In particular, large correlation coefficients can arise from external large-scale environmental constraints and need not reflect actual correlations due to other factors, such as human movements. To characterize the level of correlation in a particular system, we therefore need a reference (or "null") model that gives the corresponding value of the correlation coefficient in the absence of transportation flows. We propose a method of obtaining such a null model and apply it to the dynamics of influenza transmission in the United States and France. In both cases, we find robust transmission channels for epidemic spread. In a second step, we try to relate the existence of such channels to transportation flows.

I. MATERIAL AND METHODS 
A. Data sets 

We analyzed interpandemic influenza epidemics occurring during the periods 1972-2002 for the United States and 1984-2004 for France. In both cases, we defined an influenza "epidemic period" as a year running from September to September, so as not to truncate the epidemic season. For the United States, we used weekly state-specific mortality rates for pneumonia and influenza collected by the Centers for Disease Control and Prevention, restricted to the 48 contiguous states and the District of Columbia. According to Viboud et al. [15] and Greene et al. [28], the weekly time series of pneumonia and influenza mortality appear to be useful indicators of the spatial spread over time and, interpreted cautiously, of incidence within each state. For France, we used estimates of the daily incidence of influenza-like illness in each region, and we restricted our study to the 21 continental regions of France. Those estimates were based on data collected by the Sentinel Network [21], a network of general practitioners distributed throughout the entire French territory. The estimates were in agreement with drug-sales data, which confirms their relevance [30]. We investigated a possible relation between our indicator and several types of transportation flow.
Data on yearly air traffic and commuter volumes between US states were obtained from the Bureau of Transportation Statistics (unpublished data; http://www.bts.gov) and the Census Bureau [31]. Data on French interregional train-traffic volume per year were obtained from the French National Railway Service (unpublished data; http://www.sncf.fr), and interregional automobile traffic volume was based on 2001 data obtained from the Service d'Etudes techniques des Routes et Autoroutes (unpublished data; http://www.setra.equipement.gouv.fr). We also took into account geographic distances between states or regions, approximated by the distances between state capitals for the United States and between regional prefectures for France. To investigate the possible effect of climate, we used weekly temperature data for US states for the period 1995-2006, obtained from the National Weather Service [32].
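As an illustration of the September-to-September convention, the following minimal sketch groups dated observations into epidemic periods. The record format (a list of `(date, value)` pairs) and the function names are hypothetical; the source does not describe how the data were stored, only that each period runs from September to September.

```python
from collections import defaultdict
from datetime import date


def epidemic_period(d: date, start_month: int = 9) -> int:
    """Return the label of the September-to-September period containing d.

    Dates from September of year Y onward belong to period Y;
    earlier dates in the calendar year belong to period Y - 1.
    """
    return d.year if d.month >= start_month else d.year - 1


def split_into_periods(records):
    """Group (date, value) records by epidemic period, in time order."""
    periods = defaultdict(list)
    for d, value in sorted(records):
        periods[epidemic_period(d)].append(value)
    return dict(periods)


# Example: a January 1990 observation belongs to the 1989 period.
weekly = [(date(1990, 1, 7), 2.0), (date(1989, 8, 27), 1.0), (date(1989, 12, 3), 3.0)]
print(split_into_periods(weekly))  # {1988: [1.0], 1989: [3.0, 2.0]}
```

Labeling a period by the year in which it starts keeps each winter season, which straddles the calendar year, in a single series.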



B. Methods 

For each of the two data sets, we computed a correlation-coefficient-based indicator that enabled us to assess the importance of travel flows. For the United States, results were computed over 30 years of weekly data, and for France they were computed over 20 years of daily data. For all pairs of areas i and j, we computed the usual Pearson correlation coefficient r_ij for incidence over each epidemic period (see Appendix). The value of this coefficient is usually difficult to interpret, however, and needs a reference value for comparison. We therefore propose below a simple way to cancel correlations in real data (uncorrelated model). We also estimate the maximal correlation that one can observe in the system under study. The values obtained for the uncorrelated and maximally correlated models define an interval that allows a quantitative estimate of the correlations existing between different areas.

Uncorrelated model. If the spread of a disease in a country is essentially due to the travel of infected persons between cities, there will be a particular time ordering of the activity profiles of different areas. The extreme case corresponds to carriers who can make only short-range displacements, leading to a spatial diffusion phenomenon with a well-defined epidemic front propagating at a given velocity (as was the case, for example, for the Black Death [33, 34]). In this case, an outbreak appears in an area at a time directly related to its distance from the initial infectious seed. In modern societies, people can travel over large distances, which dramatically modifies the spatial diffusion of the disease and blurs its simple propagating front; however, a pattern should still exist. Indeed, if an infected person carries the disease from an infected area A to a noninfected area B and transmits it there, we will observe a time lag between the epidemic peaks in the two areas. We can expect that the larger the flow between areas A and B, the greater the similarity between their epidemic profiles. The time ordering of, and the correlation between, the epidemic profiles in different regions are thus a signature of the flow of infected individuals. Unfortunately, epidemic activity usually occurs over a short period of time in all regions of a country, so a large observed correlation could result from this short-period constraint alone and carry no significant information about transmission channels. The aim of the null model is to discount such large correlation values, which arise by chance. It therefore corresponds to a random time shift of the activity profiles of the different regions. If the actual correlation coefficients are then significantly larger than those obtained with this null model, which corresponds to epidemics emerging at random times, this is the signature of genuine correlations induced by the displacement of individuals from one region to another. For every epidemic period (from September to September), we define the "epidemic activity time range" as the interval that contains the epidemic peaks of all regions or states. We then shift the whole epidemic profile of each region or state by a random amount, drawn from a uniform distribution, such that the epidemic profile still lies within the epidemic activity time range (Figure 1, parts A and B). The next step is to compute the Pearson correlation coefficients for all pairs of areas, for a given year, and for a large number of random shifts. Finally, we obtain the average correlation coefficient m_ij between all pairs of areas i and j for the uncorrelated (or "null") model.

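The null model can be sketched in a few lines. The source does not say how a shifted profile is handled at the edges of the season, so this sketch zero-pads (values pushed outside the window are dropped), draws integer peak positions uniformly over the activity time range, and takes the peak as the time step of maximum incidence; all of these are assumptions, and the function names (`pearson`, `shift`, `peak_time`, `null_correlation`) are hypothetical. Because r_ij depends only on the two profiles involved, only the pair being compared is shifted, while the activity time range is still computed from the peaks of all areas.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length series (1/n normalization)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat series counts as uncorrelated
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def shift(profile, d):
    """Translate a profile by d time steps; vacated steps are zero (assumption)."""
    n = len(profile)
    out = [0.0] * n
    for t, v in enumerate(profile):
        if 0 <= t + d < n:
            out[t + d] = v
    return out


def peak_time(profile):
    """Index of the (first) maximum of a profile."""
    return max(range(len(profile)), key=profile.__getitem__)


def null_correlation(profiles, i, j, n_draws=1000, seed=0):
    """Sketch of m_ij: mean r_ij after moving the peaks of areas i and j to
    independent, uniformly drawn times within the epidemic activity time range."""
    rng = random.Random(seed)
    peaks = [peak_time(p) for p in profiles]
    lo, hi = min(peaks), max(peaks)  # range spanned by the peaks of all areas
    total = 0.0
    for _ in range(n_draws):
        xi = shift(profiles[i], rng.randint(lo, hi) - peaks[i])
        xj = shift(profiles[j], rng.randint(lo, hi) - peaks[j])
        total += pearson(xi, xj)
    return total / n_draws
```

For instance, with three triangular profiles peaking at weeks 10, 10, and 20, the first two are perfectly correlated as observed (r = 1), but their null-model average is well below 1 because their peaks are scattered independently over weeks 10 to 20.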
Figure 1: Model of maximal correlation and uncorrelated epidemic disease spread. A) Weekly rate of mortality from pneumonia and influenza (P&I) per 100,000 for two different US states. The letters a and b and the dotted lines indicate the corresponding epidemic peaks. B) Random reassignment of the peak moments, resulting in a shift of the different epidemic profiles by randomly chosen amounts d1 and d2. C) The specific shift that gives the highest correlation coefficient for the pair of epidemic profiles, which corresponds to a synchronization of the peaks. For parts B and C, the dashed lines indicate the "epidemic activity time range."

Maximal correlation. The correlation between two areas will be maximal for very similar, "synchronized" activity profiles whose peaks are reached essentially at the same time. To obtain this maximal value of the correlation coefficient for a given pair of areas and a given year, we compute the cross-correlation of the epidemic profiles of the two areas. When the maximum correlation coefficient is found, we store it and reapply the method to the next pair of areas. Parts A and C of Figure 1 illustrate the method and show epidemic profiles with the shift that produces the maximum correlation coefficient (corresponding, as expected, to synchronization of the epidemic peaks). The output is a matrix M_ij of maximal possible correlation coefficients for each pair of areas in a given year.
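The source describes this step as a cross-correlation; the sketch below implements it directly as the largest Pearson coefficient over all relative time shifts of the second profile, zero-padding at the edges of the season (an assumption). `max_correlation` and its `max_lag` parameter are hypothetical names, and the `pearson` and `shift` helpers are included so the block runs on its own.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length series (1/n normalization)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat (e.g. fully shifted-out) series counts as uncorrelated
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def shift(profile, d):
    """Translate a profile by d time steps; vacated steps are zero (assumption)."""
    n = len(profile)
    out = [0.0] * n
    for t, v in enumerate(profile):
        if 0 <= t + d < n:
            out[t + d] = v
    return out


def max_correlation(xi, xj, max_lag=None):
    """Sketch of M_ij: the largest r between xi and xj shifted by every lag in
    [-max_lag, max_lag]. Lag 0 is included, so the result is never below r_ij.
    Returns (M_ij, best_lag)."""
    if max_lag is None:
        max_lag = len(xi) - 1
    best_r, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        r = pearson(xi, shift(xj, lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

Two identically shaped profiles peaking five weeks apart reach M = 1 at a lag of -5, that is, when their peaks are synchronized, as in part C of Figure 1.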

A correlation-coefficient-based indicator. The uncorrelated model gives the value of the correlation coefficient in the absence of time correlations between the epidemic peaks, while the maximal value is obtained when the peaks are synchronized. We can combine these values to obtain a parameter X_ij(t) between areas i and j for year t:

$$ X_{ij}(t) = \frac{r_{ij}(t) - m_{ij}(t)}{M_{ij}(t) - m_{ij}(t)}, $$

where r_ij(t), m_ij(t), and M_ij(t) denote the correlation coefficients defined above (calculated for year t). Since M_ij(t) is the maximum over all relative shifts, which include the unshifted profiles, r_ij(t) ≤ M_ij(t), so X is bounded above by 1 by construction. Because time reshuffling essentially cancels time-ordering correlations, poorly correlated and anticorrelated states can exist and have r_ij < m_ij, leading to negative values of X. These values can even be very low when M_ij - m_ij << 1. Below, we analyze these quantities X for every year.
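A direct transcription of the formula for X_ij(t) is shown below, with a guard for the degenerate case M_ij(t) = m_ij(t); how the authors handled that case is not stated, and `x_indicator` is a hypothetical name.

```python
def x_indicator(r, m, big_m):
    """X_ij(t) = (r_ij(t) - m_ij(t)) / (M_ij(t) - m_ij(t)).

    r: observed correlation, m: null-model average, big_m: maximal correlation.
    """
    denom = big_m - m
    if denom == 0:
        # Degenerate case (handling not specified in the source).
        raise ValueError("M_ij(t) == m_ij(t): X_ij(t) is undefined")
    return (r - m) / denom


print(x_indicator(0.8, 0.5, 0.9))   # about 0.75: r lies 3/4 of the way from m to M
print(x_indicator(0.3, 0.5, 0.52))  # about -10: r < m and M - m is small
```

The second call illustrates why X can be very negative: a modest deficit r - m = -0.2 is divided by a tiny interval M - m = 0.02.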

Robust patterns. We are looking for spatial patterns due to persistent factors that do not vary significantly from one year to another. We are thus interested in pairs of areas showing both a high average value ⟨X_ij⟩ and a low dispersion around this average, which indicates regularly large values of X and hence a robust bond between the corresponding areas. As we will see, most links have a relatively large average value ⟨X⟩, so the stability of a link is conveniently characterized by the inverse coefficient of variation (CV) of X_ij, defined as

$$ \frac{1}{CV(X_{ij})} = \frac{\langle X_{ij} \rangle}{\sigma(X_{ij})}, $$

where the average ⟨X_ij⟩ and the standard deviation σ(X_ij) are taken over the yearly values X_ij(t). A low standard deviation is then associated with a high value of 1/CV(X_ij).
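The inverse coefficient of variation can be computed per pair from its yearly X values. The sketch below uses the sample standard deviation (the source does not say which estimator was used), and `inverse_cv` and `rank_links` are hypothetical helpers; `rank_links` simply orders pairs from the most to the least stable link.

```python
from statistics import mean, stdev


def inverse_cv(x_by_year):
    """1/CV(X_ij) = <X_ij> / sigma(X_ij), taken over the yearly values X_ij(t).

    Uses the sample standard deviation (an assumption; the estimator is not
    stated in the source).
    """
    if len(x_by_year) < 2:
        raise ValueError("need X_ij for at least two epidemic periods")
    s = stdev(x_by_year)
    if s == 0:
        return float("inf")  # identical values every year: perfectly stable link
    return mean(x_by_year) / s


def rank_links(x_series):
    """Sort pairs (i, j) by decreasing 1/CV(X_ij); x_series maps pair -> yearly X."""
    scores = {pair: inverse_cv(xs) for pair, xs in x_series.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, yearly values 0.8, 0.9, and 1.0 have mean 0.9 and sample standard deviation 0.1, giving 1/CV = 9.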

Spatial autocorrelation. To determine whether our indicator exhibits spatial correlations, we use Moran's I, a weighted correlation coefficient for X in which the weights depend on the distance h between two regions. In our results, we use binary weights: a weight of 1 is attributed to pairs of states at a distance between h and h + a, and 0 otherwise (Figure 3 uses bands of width a = 300 km).
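The text does not spell out the Moran's I formula or how it is applied to the pairwise quantity X, so the sketch below uses the standard Moran's I for one value per location, combined with the binary distance-band weights described above. The half-open band h ≤ d < h + a, the planar coordinates (the paper uses distances between state capitals or regional prefectures), and the function names are assumptions.

```python
import math


def band_weights(coords, h, a):
    """Binary weights w_ij = 1 if h <= d(i, j) < h + a and i != j, else 0.

    The half-open band is an assumption; coords are planar (x, y) points.
    """
    n = len(coords)
    return [
        [1.0 if i != j and h <= math.dist(coords[i], coords[j]) < h + a else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def morans_i(values, weights):
    """Standard Moran's I: (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z are deviations from the mean and W is the sum of all weights."""
    n = len(values)
    zbar = sum(values) / n
    z = [v - zbar for v in values]
    w_total = sum(sum(row) for row in weights)
    if w_total == 0:
        raise ValueError("no pairs fall inside this distance band")
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_total) * num / den
```

Clustered values give a positive I and alternating values a negative one: for four collinear points with unit spacing and the band [1, 1.5), the values 1, 1, -1, -1 give I = 1/3, while 1, -1, 1, -1 give I = -1. Sweeping h in steps of a traces a curve like those in Figure 3.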



II. RESULTS 

In Figure 2, we provide an example of average values of X over the period 1972-2002 between California and the other US states. As we can see, most connections have a large average correlation, and the main point of interest in Figure 2 is the heterogeneity of the confidence intervals, which give information about the stability of the bond between two areas from year to year. Indeed, some bonds have a highly fluctuating correlation (for Wisconsin, 95 percent confidence interval: 0.46, 0.92), while others display a remarkable robustness (for Arizona, 95 percent confidence interval: 0.84, 0.94), which is the signature of a stable recurring pattern.

Figure 2: Average correlations of epidemic profiles between California (y-axis) and the other 48 US states (47 contiguous states plus the District of Columbia) (x-axis), computed for 30 epidemic periods (1972-2002). Data are ranked from the highest value to the lowest. Bars, 95% confidence interval.
We must also consider the spatial autocorrelation of the X indicator. We aggregate data for all of the states and show, in Figure 3, the evolution of Moran's I with different threshold distances h. In the same plot, we show the results for X averaged over all years in our data set (1972-2002) or averaged over three different decades. The spatial autocorrelation reveals correlation clusters at both short and long distances. Figure 3 also shows that our results are stable over time.

Figure 3: Spatial autocorrelation of X. The graph shows Moran's I computed over X, plotted as a function of the distance between states. Pairs of states are grouped by the distance between them, within a 300-km band. The bold line shows the evolution of the correlation measure for X averaged over 30 years (1972-2002). The thinner lines stand for X averaged over three different decades. The curves all display the same trends, showing stability of the spreading pattern over time.

Figure 4 displays the results for the correlation indicator 1/CV(X_ij) on maps for three different states. In all three cases, we observe high 1/CV(X_ij) values for neighboring states, as well as long-range bonds. This method can also be used on the smaller scale of French regions, where we again observe two types of strong connections (Figure 5): regions are strongly correlated with their neighbors, and there are strong long-range connections, as in the case of the United States.

Figure 4: Correlations of epidemic profiles for three US states (denoted as "infected state"). The various shades of gray stand for values of 1/CV(X_ij), where i corresponds to California, Illinois, or New York (top to bottom) and j to all of the other states. In each map, black stands for the considered state (California, Illinois, or New York). We observe high values of 1/CV(X_ij) for neighboring states (Arizona for California; Indiana, Kentucky, and Ohio for Illinois; New Jersey, Pennsylvania, and Massachusetts for New York) and long-range connections between California and Texas, Illinois and New York, and New York and California.

A. Relation with transportation data



Figure 5: Correlations of epidemic profiles for three regions in France (denoted as "infected region"). The various shades of gray stand for values of 1/CV(X_ij), where i corresponds to Bretagne, Ile-de-France, or Rhone-Alpes (left to right) and j to all of the other regions. In each map, black stands for the considered region (Bretagne, Ile-de-France, or Rhone-Alpes).

In Figures 6 and 7, we plot the quantity 1/CV(X) against transportation traffic data to test its relation to transportation flows. In Figure 6, we plot 1/CV(X) against the interstate US air-traffic flow. The linear fit has a coefficient of determination of 0.74 and thus supports the hypothesis that 1/CV(X) increases linearly with air-traffic flow (note that we have not normalized the traffic with respect to population). This plot is consistent with domestic air traffic being the main vector for the spread of an epidemic in the United States. To assess the specific contribution of air traffic to our indicator, as compared with other parameters such as temperature or geographic distance, we performed a multivariate regression analysis using a linear model. For temperature, we used the Pearson correlations of weekly reported temperatures between states, based on state temperature profiles covering 11 years, to control for possible artifacts. The results of this analysis (Table I) suggest that interstate air traffic makes a greater contribution than distance or temperature (estimates were 0.407, -0.096, and 0.220, respectively). Although the model does not explain all of the variance (r^2 = 0.27), it is statistically significant (p < 0.001) and suggests that air travel is the dominant factor among those explored.
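The multivariate regressions reported in Tables I-III are linear models with standardized predictors. As a minimal, dependency-free sketch (not the authors' code), the function below z-scores each predictor, fits ordinary least squares through the normal equations, and returns the estimates b_1, b_2, ... together with r^2; it does not compute p-values. The response y could be, for example, 1/CV(X) for each pair of areas. Standardizing with the population standard deviation is an assumption, the function names are hypothetical, and in practice a library routine such as NumPy's `numpy.linalg.lstsq` would be the more robust choice.

```python
from statistics import mean, pstdev


def standardize(values):
    """z-scores using the population standard deviation (an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols_standardized(y, predictors):
    """Fit y = b_1 + b_2*A + b_3*B + ... with standardized predictors A, B, ...

    Returns ([b_1, b_2, ...], r_squared). No p-values are computed.
    """
    cols = [[1.0] * len(y)] + [standardize(p) for p in predictors]
    k = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(cols[i], y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(coef[i] * cols[i][t] for i in range(k)) for t in range(len(y))]
    y_mean = mean(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)
    return coef, 1.0 - ss_res / ss_tot
```

Because every predictor is rescaled to unit variance, the estimates b_2, b_3, ... are directly comparable across predictors measured in very different units (passengers, kilometers, correlation coefficients), which is what allows the "dominant factor" comparison in the text.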

As recent studies have shown [..., 35], travel to work [...] plays a predominant role in disease spread, and we therefore estimated its impact on our results. The volume of interstate commuting is very low (approximately 3 percent of the total number of US commuters) and decreases quickly with distance, so we needed to consider only pairs of states with common borders. Through a multivariate regression on 1/CV(X), we compared the impacts on epidemic spread of air travel and of commuting by ground transportation. Air traffic made a larger contribution to the regression model (Table II; estimates of 0.341 for air traffic and 0.068 for ground commuting), and we can thus conclude that it is the dominant transportation mode, even between neighboring states.
On a smaller scale, the results for France were more contrasted. As Figure 7 shows, 1/CV(X) increased with train traffic. We also performed a linear multivariate regression analysis of 1/CV(X) as a function of train traffic, automobile traffic, and geographic distance; our aim was not to obtain a fully explanatory model but to test whether one of these factors would dominate the others. The results (Table III) show that distance makes little contribution to the model (estimate -0.092) and that automobile and train traffic have essentially the same weights (0.265 and 0.204, respectively). As in the US case, the model does not explain all of the variance (r^2 = 0.22), but the result is statistically significant (p < 0.001).

Figure 6: Correlation indicator for air transportation flow in the United States. The graph shows the inverse coefficient of variation (CV) of X as a function of domestic air transportation flow for all pairs of states (binned data). Since the range of variation of the traffic was very wide, the log-linear plot is shown. Air transportation was measured as the number of passengers traveling by plane between pairs of states in 2000. The curve is a linear fit of the equation 1/CV(X) = A × flow + B, where A = 2.5 × 10^-7 and B = 0.91 (r^2 = 0.74).



Figure 7: Correlation indicator for train traffic in France. The graph shows the inverse coefficient of variation (CV) of X as a function of train transportation for all pairs of regions (binned data). Since the range of variation of the traffic was very wide, the log-linear plot is shown. Train transportation was measured as thousands of passengers traveling by train between pairs of regions in 2001. The curve is a linear fit of the equation 1/CV(X) = A × flow + B, where A = 4.6 × 10^-5 and B = 3.9 (r^2 = 0.52).

                                Estimate
Intercept
A: Air traffic volume           0.407
B: Distance between states      -0.096*
C: Temperature correlation      0.220*
*p < 0.001.

Table I: The linear regression equation takes the form X = b_1 + b_2*A + b_3*B + b_4*C, where A, B, and C stand for standardized air traffic volume, distances between states, and correlation of state temperatures, respectively, and b_j are the estimates given in the table (r^2 = 0.27, p < 0.001).

III. DISCUSSION

We have presented a method that aims to identify strongly connected areas from raw epidemic data. Our main goal is to identify preferred spatial paths, or epidemic pathways, for the spread of infectious diseases. If these paths exist, they have an effect on the spatiotemporal pattern of reported cases
of disease; consequently, we should be able to identify them from the temporal evolution of local influenza incidence. An important point is the need to discriminate what is due to inherent noise, spatial effects, and other constraints in the data. To achieve this goal, we define and use an uncorrelated model, which cancels the existing correlations due to transportation flows. The maximal correlation gives the upper bound of the correlation that would exist between two areas if they were perfectly synchronized. These lower and upper bounds enable us to assess the level of genuine correlation due to transportation flow between regions. By observing data from several years, we can analyze the evolution of the level of correlation between areas and detect the robustness of patterns over time. The spatial autocorrelation analysis showed that the behavior of our indicator is stable over time: results were consistent from one set of years to another, despite possible environmental changes. This stability, particularly with respect to the increase in air-travel flow (on the order of 300 percent between 1972 and 2002), seems to indicate that for some time now, air-travel flows have been large enough to propagate an epidemic throughout the United States. This implies that, to be efficient, travel restrictions would have to be so drastic as to be economically unreasonable, a finding that agrees with other recent results [5, 15]. The spatial autocorrelation analysis also revealed the existence of geographic clusters of strongly correlated neighboring states at short distances (<600 km) and other clusters at long distances (>3,000 km). We cannot relate these results directly to the smaller scale (the intercounty level) studied by Viboud et al. [15], since we were using aggregated data, but as expected, our results agree with those of that study at the larger interstate scale.



                                Estimate
Intercept
A: Air traffic                  0.341
B: Ground commuting             0.068
*p < 0.001.

Table II: The linear regression equation takes the form X = b_1 + b_2*A + b_3*B, where A and B stand for air traffic and commuting between states, respectively, and b_j are the estimates given in the table (r^2 = 0.12, p < 0.001).

The next step is to interpret these patterns in terms of social, geographic, or other epidemiologic data. In the case of the United States, we were able to relate the existence of persistent channels of transmission to interstate air transportation flows, while other factors such as climate seemed less important. We also showed that even between neighboring states, air travel dominates over commuting by ground transportation. Commuters very likely play a role in the spread of epidemics at the county level, and further investigation at this smaller scale is needed. More generally, such an interpretation might not always be feasible because of the mixing of different modes of transportation. Indeed, the case of France, which is of interest for its smaller scale, highlights the weakness of a hypothesis based on a single mode of transportation. One explanation might be that different modes of travel (train, automobile, and plane) compete at this scale, with no clearly dominant mode, a result supported by our multivariate analysis. The existence of large values of 1/CV(X), however, reveals strongly connected regions. The lack of clear correlations between large values of 1/CV(X) and large traffic volumes does not affect the quality of the indicator 1/CV(X) but simply reflects a more mixed situation regarding the transportation modes involved in the influenza epidemic process. This work has several limitations. In particular, pneumonia and influenza mortality (for the United States) and influenza-like illness (for France) are only proxies for laboratory-confirmed influenza. Moreover, we did not take into account the change in influenza strains from one year to another. However, these factors are unlikely to have affected our conclusions about robust transmission channels. Another possibly important limitation
concerns the fact that all studies (including ours) are limited to the spread of disease inside a given country and neglect the exchange of disease with other countries. The corresponding flows are usually far from negligible, and we think their importance in national spread should be assessed in future studies.

                                Estimate
Intercept
A: Distance between regions     -0.092†
B: Automobile traffic volume    0.265*
C: Train traffic volume         0.204*
*p < 0.001, †p < 0.1.

Table III: The linear regression equation takes the form X = b_1 + b_2*A + b_3*B + b_4*C, where A, B, and C stand for the standardized distances between regions, automobile traffic volume, and train traffic volume, respectively, and b_j are the estimates given in the table (r^2 = 0.22, p < 0.001).

Our heavy use of retrospective data to compensate for data fluctuations did not allow us to analyze and interpret single-year fluctuations, as was done in the paper by Brownstein et al. [16]. Those authors chose a different trade-off from ours and aggregated their data into a few US geographic regions, whereas we kept a statewise approach and used as many years as possible. Our results seem to suggest that for the United States, realistic modeling of the spread of epidemics at the interstate level may need to take only air transportation into account. They also seem to imply that in the case of France, a realistic model would need to include several transportation modes. In summary, we believe that this simple and robust method, which detects important channels of disease transmission, could be helpful in modeling the spread of epidemics and in assessing containment strategies that rely on travel restrictions.
APPENDIX

For all pairs of areas i and j, we compute the usual Pearson correlation coefficient for disease incidence over each epidemic period (composed of n time steps) as

$$ r_{ij} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left(x_i(t) - \langle x_i \rangle\right)\left(x_j(t) - \langle x_j \rangle\right)}{s_i s_j}, $$

where x_i(t) and x_j(t) are the estimated incidences for areas i and j at time step t, ⟨x_i⟩ and ⟨x_j⟩ are their averages over the epidemic period, and s_i and s_j are their standard deviations (computed with the same 1/n normalization).
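The appendix formula translates directly into code. This stdlib-only sketch applies it to every pair of areas for one epidemic period; the function names are hypothetical, and a constant series (zero standard deviation), for which r_ij is undefined, raises an error rather than returning a value.

```python
import math


def pearson(xi, xj):
    """r_ij = (1/n) sum_t (x_i(t) - <x_i>)(x_j(t) - <x_j>) / (s_i s_j),
    with s_i, s_j the 1/n (population) standard deviations."""
    n = len(xi)
    if n == 0 or n != len(xj):
        raise ValueError("series must be non-empty and of equal length")
    mi, mj = sum(xi) / n, sum(xj) / n
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / n
    si = math.sqrt(sum((a - mi) ** 2 for a in xi) / n)
    sj = math.sqrt(sum((b - mj) ** 2 for b in xj) / n)
    if si == 0 or sj == 0:
        raise ValueError("r_ij is undefined for a constant series")
    return cov / (si * sj)


def correlation_matrix(series):
    """r_ij for every pair of areas; series[i] is area i's incidence over one period."""
    k = len(series)
    return [[pearson(series[i], series[j]) for j in range(k)] for i in range(k)]
```

Using 1/n in both the covariance and the standard deviations makes the normalizations cancel, so the result equals the textbook coefficient computed with 1/(n - 1) throughout.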



